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Abstract. The need for small area estimates is increasingly felt in both 
the public and private sectors in order to formulate their strategic plans. 
(^ • It is now widely recognized that direct small area survey estimates are 

^N , highly unreliable owing to large standard errors and coefficients of vari- 

ation. The reason behind this is that a survey is usually designed to 
achieve a specified level of accuracy at a higher level of geography than 
that of small areas. Lack of additional resources makes it almost im- 
C*^ ' perative to use the same data to produce small area estimates. For ex- 

ample, if a survey is designed to estimate per capita income for a state, 
the same survey data need to be used to produce similar estimates for 
W ■ counties, subcounties and census divisions within that state. Thus, by 

necessity, small area estimation needs explicit, or at least implicit, use 
of models to link these areas. Improved small area estimates are found 
^ ■ by "borrowing strength" from similar neighboring areas. 

CZ3 I The key to small area estimation is shrinkage of direct estimates 

toward some regression estimates obtained by using in addition ad- 
ministrative records and other available sources of information. These 
shrinkage estimates can often be motivated from both a Bayesian and 
^f^ ■ a frequentist point of view, and indeed in this particular context, it 

CN I is possible to obtain at least an operational synthesis between the two 

^'l ' paradigms. Thus, on one hand, while small area estimates can be devel- 

cn . oped using a hierarchical Bayesian or an empirical Bayesian approach, 

similar estimates are also found using the theory of best linear unbi- 
ased prediction (BLUP) or empirical best linear unbiased prediction 
(EBLUP). 
The present article discusses primarily normal theory-based small 
KN ■ area estimation techniques, and attempts a synthesis between both the 

j^ I Bayesian and the frequentist points of view. The results are mostly 

discussed for random effects models and their hierarchical Bayesian 
counterparts. A few miscellaneous remarks are made at the end de- 
scribing the current research for more complex models including some 
nonnormal ones. Also provided are some pointers for future research. 

Key words and phrases: Area-level models, BLUP, confidence inter- 
vals, EBLUP, empirical Bayes, hierarchical Bayes, mean squared error, 
multivariate, second-order unbiased, unit-level models. 
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1. INTRODUCTION 

Small area estimation has become a topic of grow- 
ing importance in recent years. The need for such 
estimates is increasingly felt in both the public and 
private sectors in order to formulate their strategic 
plans. For instance, to address emerging or exist- 
ing social issues, many national governments have 
passed laws that require production of reliable and 
up-to-date small area estimates on a regular basis. 
As an example, in the early 1990s, the U.S. Congress 
passed a law requiring the Secretary of Commerce to 
produce and publish, at least every two years, start- 
ing in 1996, current small area estimates related to 
the incidence of poverty for states, counties, local 
jurisdictions of governments and school districts. In 
the private sector, businesses, especially the smaller 
ones, make decisions based on local income, popu- 
lation and environmental data to evaluate markets 
for new products and to determine areas for the lo- 
cation, expansion and contraction of their activities. 

Small areas may refer to small geographical areas 
such as counties, subcounties, census tracts, etc. Al- 
ternately, they may also refer to small domains cross- 
classified by age, sex and other demographic char- 
acteristics. Other than "small areas" and "small do- 
mains," often the terms "local areas," "subdomains" 
and "substates" are used interchangeably. Through- 
out this article, we will use the term "small area," 
possibly the most popular usage of the term, espe- 
cially in survey sampling. 

Shrinkage estimators have even a longer history 
than small area estimators. An exact definition of 
these estimators is hard to come by. Lemmer (1988) 
in his Encyclopedia of Statistical Sciences article 
characterized shrinkage estimators as ones obtained 
through modification of some standard estimators, 
for example, maximum likelihood estimator (MLE), 
uniformly minimum variance unbiased estimator 
(UMVUE), least squares estimator, etc., in order 
to minimize some desirable criterion such as mean 
squared error (MSE), quadratic risk, bias, etc. With 
these objectives in mind, shrinkage estimators can 
be interpreted in a very broad sense. In particular, 
the best linear unbiased predictors (BLUP's), em- 
pirical best linear unbiased predictors (EBLUP's), 
empirical Bayes (EB), hierarchical Bayes (HB), and 
possibly a host of other estimators fall within this 
general category. One common feature of all these 
estimators is that they are usually weighted aver- 
ages of one of the aforementioned standard estima- 
tors and some other estimator reasonable under an 



appropriate model. Weights to these estimators are 
determined with the objective of meeting some "op- 
timality" criterion. 

Shrinkage estimates have a natural place in small 
area estimation where direct estimates such as the 
MLE, UMVUE, etc., are usually unreliable owing to 
large standard errors and coefficients of variation as- 
sociated with them. The reason behind this is that 
the original survey was targeted to achieve accuracy 
at a higher order of aggregation than that of small 
areas. Due to limited resources, the same survey 
data need to be used for producing small area esti- 
mates. This necessitates "borrowing strength" from 
similar other small areas with the objective of "in- 
creasing the effective sample size" in order to obtain 
estimates of increased precision. 

The early small area estimators achieved this ob- 
jective by shrinking the area-specific direct estima- 
tors (e.g., county-specific averages) toward some over- 
all estimator (e.g., the state average). Later, with 
the availability of auxiliary information from ad- 
ministrative records and other sources, the direct 
estimators are now usually shrunk toward some es- 
timated regression surface. This shrinking process 
needs explicit (or at least implicit) use of models. 

Bayesian estimators have been in existence for 
more than two centuries. Very often, they can be 
regarded as shrinkage estimators, shrinking, for ex- 
ample, the sample mean toward the prior mean. The 
BLUP and EBLUP estimators developed by Hen- 
derson (1953) for mixed linear models are also gen- 
uine shrinkage estimators, shrinking the direct esti- 
mators toward some regression estimators. However, 
as the title of this special issue suggests, the name 
"shrinkage" possibly was coined with the seminal 
paper of Stein (1956). Stein introduced shrinkage 
estimators to estimate a multivariate normal mean 
vector and proved under the sum of squared error 
loss their domination over the sample mean vec- 
tor in three or higher dimensions. He gave a purely 
decision-theoretic motivation of his result, and was 
implicitly considering a balanced one-way ANOVA 
model for random effects. The original result of Stein 
involved shrinking the sample mean toward some 
guessed value of the population mean. Later exten- 
sions of his ideas due to Lindley (1962) and Stein 
(1962) led to shrinkage toward an overall average, 
and more generally to a regression surface, still with 
balanced data. Stein's estimators gained immense 
popularity in the 1970s when Efron and Morris, in 
a series of articles, gave interesting EB interpreta- 
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tion of these estimators (see, e.g., Efron and Mor- 
ris, 1973). A pioneering extension of Stein's ideas in 
the small area estimation context is due to Fay and 
Herriot (1979) in their highly referred article. The 
paper showed how Stein-type results (without nec- 
essarily the exact dominance consideration) could 
be extended to unbalanced random effect regres- 
sion models with tremendous potential for applica- 
tion. 

It is near impossible to cover all aspects of small 
area estimation in a single review article. Our pri- 
mary focus will be on one-way random effects regres- 
sion models, and connecting the ideas of BLUP and 
EBLUP with HE and EB estimators. These mod- 
els are usually referred to in the small area litera- 
ture as "area-level" models where one begins with 
some small area summary statistics, and tries to im- 
prove on these estimators by shrinking them toward 
some regression surface. This is in contrast to the so- 
called "unit-level" models where one has data avail- 
able for the sampled units within a small area. We 
will barely touch upon the latter. Another compo- 
nent of research which has received scant attention 
in the small area literature is the development of EB 
confidence intervals. We will discuss this topic also 
at some length. For a detailed exposure to small area 
estimation, the reader is referred to the recent book 
of Rao (2003a) and the review articles of Ghosh and 
Rao (1994), Pfeffermann (2002), Rao (1999, 2003b) 
and Datta (2009). 

The outline of the remaining sections is as follows. 
In Section 2, we discuss balanced one-way random 
effects regression models, and discuss the connec- 
tion between the BLUP's, EBLUP's, HB, EB, and 
in particular, the Stein-type shrinkage estimators. 
Section 3 extends these results to unbalanced one- 
way models, and compares and contrasts both HB 
and EB estimators in this setup. MSE approxima- 
tion of small area estimators is also discussed in this 
section. Section 4 discusses multivariate small area 
shrinkage estimators, and discusses one particular 
application related to adjustment of census counts. 
Section 5 discusses EB confidence intervals for both 
balanced and unbalanced data. Section 6 gives a 
brief account of unit-level models for small area es- 
timation. Section 7 contains a few other small area 
models such as measurement error models and gen- 
eralized linear models. This section contains also a 
discussion of balanced loss functions in the context 
of small area estimation. Section 8 contains a sum- 
mary of the results presented, and provides a few 
pointers toward topics for future research. 



2. SHRINKAGE ESTIMATORS 
FOR BALANCED DATA 

The primary objective of this section is to intro- 
duce shrinkage estimators of small area means under 
different paradigms, and point out the interrelation- 
ship between them. The corresponding uncertainty 
measures are also compared. We begin with the fol- 
lowing model. 

Let yi (i = 1, . . . ,7Ti) denote the area-level survey 
estimators for the m small areas. Consider the model 

yilOi'^^ Ni9i,V), and 
(2.1) 

OilA'^^ N{x[p,A), i = l,...,m. 

In the above xi , . . . , x^ are p-dimensional design 
vectors and /3 {p x 1) is the unknown regression co- 
efficient. Writing 9i = xf /3 + Ui (i = 1,. . . ,m), it is 
easy to reexpress (2.1) as a random effects model 
with 

x[(3 + Ui + ei, 



Vi 



1, 



,m, 



(2.2) 

where the Ui and the Cj are mutually independent 
with Ui '~"'- N{<d,A) and the a *~ A^(0, V). Further, 
writing X = (xi, . . . ,Xm)^, y = {yi,...,ymf , u = 
(ui, . . . ,Um)'^ and e = (ei, . . . ,6^)^, one can rewri- 
te (2.2) in matrix notation as 

(2.3) y = X/3 + u + e. 

We assume rank(X) = p{< m). Noting that margi- 
nally, y ~ N{X.p, {V + A)Im), where Im is the iden- 
tity matrix of order m, it is clear that we encounter 
an identifiability problem when both V and A are 
unknown. The problem does not occur in a unit- 
level model when one can find a separate estimate 
of V by utilizing the unit-level data. However, this 
option is unavailable in an area-level model, where 
it is customary to assume a known V . In practice, V 
is a sort of smoothed estimate, for example, using 
the generalized variance function approach; see, for 
example, Wolter (1985) or Otto and Beh (1995). 

First assume A(> 0) is known. We begin with the 
HB model with the prior it{(3) = 1. Then we have 
the following theorem. 

Theorem 1. Under the given model, the poste- 
rior distribution of 9 is A^((l — i?)y-|-i?Pxy, V{{1 — 
B)lm + BPx.)), where B = V/{V + A) and Px = 

x(x^x)-ix^. 

Proof. The result follows by noting that 6\/3, 
y ~ Niil - B)y + BX(3,V{1 - B)I^) and /3|y ~ 
iV((X^X)-iX^y,(y + ^)(X^X)-i), and then us- 
ing the formulas for iterated expectation and vari- 
ance along with normality of the conditionals. D 
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Remark 1. It follows from the above theorem 
that the posterior mean given by 

(2.4) e"" = E{e\y) = {1 - B)y + BPxy 

is a weighted average of the direct estimator y and 
the regression estimator Pxy = X/3, where (3 = 
(X • X)~^X y. It is easy to check that the weights 
are inversely proportional to the sample variance 
and the prior variance. Thus 0^ shrinks the direct 
estimator y of to the regression estimator Pxy 
of 9, where the amount of shrinking depends on the 
ratio V/A. In the limiting cases when i? — )• (i.e., 
when V <^ A) or B ^ 1 (i.e., when V > A), 6^ 
tends respectively to the direct estimator y and the 
regression estimator Pxy, quite in keeping with 
one's intuition. Later, in Theorem 2, we will mo- 
tivate the estimator in (2.4) as the BLUP of with- 
out any distributional assumption. We also point 
out that this is also the best unbiased predictor un- 
der normality. 

Remark 2. It is also important to note that if the 
parameter (3 were also known, the posterior variance 
of e would be V{1 - B)Im. Thus the term VBPx 
in the posterior variance in Theorem 1 can be inter- 
preted as the additional posterior uncertainty due 
to unknown (3, but known A. We will examine later 
in this section the effect of an unknown A as well on 
the posterior variance. 

Next we show that the estimator of 6 given in (2.4) 
can be motivated without any distributional assump- 
tion but using only the first two moments. The fol- 
lowing theorem proves that this estimator is a BLUP, 
that is, it has the smallest mean squared error (MSE) 
within the class of all linear unbiased estimators 
(predictors) of 6. Also, the MSE equals the posterior 
variance given in Theorem 1. 

Theorem 2. The estimator 6^ of 6 given in (2.4) 
is the BLUP of e. Also, E[{e^ - 0)(0^ - 6)^] = 

y{(i-s)i^ + sPx}. 

Proof. Since E{Y) = X/3, any linear unbiased 
predictor C Y + hoie = X/3 + u must satisfy CX/3 + 
b = X/3 for all /3. That is, b = 0, and CX = X, or 
equivalently, CPx = Px- For such a predictor CY, 
since CY - 6 = {C - I^)u + Ce, 

E[{CY - e){CY - 0f] 

= A(c-i^)(c-i„)^ + ycc^ 

= {V + A)CC^ - A{C + C^) + Al^ 
(2.5) 



= V{l-B)lm 

+ {V + A){C-{l-B)l^} 

■{C-{l-B)lm]^. 

Now subject to the condition CPx = Pxj it can be 
shown that 

{C-(l-i?)I^}{C-(l-i?)I^}^ 
= {C - (1 - S)I„ - 5Px + 5Px} 
(2.6) ■{C-{l-B)lm-BV^ + BV^Y 

= {C - (1 - B)lm - BPx} 

• {C - (1 - B)lm - SPx}^ + S'Px. 

Note that C = (1 — B)lm + i^Px satisfies the condi- 
tion CPx = Px and this choice minimizes -E[(CY — 
0)(CY - eY]. Thus the BLUP of 6> is given by 6^. 
Also, from (2.5) and (2.6), it follows that the mean 
squared and product matrix of prediction error of 
theBLUP is T/{(l-S)I„ + SPx}. □ 

Remark 3. Under normality of u and e, the 
BLUP 6 of 6 is also the best unbiased predictor 
of 0; that is, among all unbiased predictors of 0, 
has the least mean squared error. 

Theorems 1 and 2 establish the equivalence of the 
BLUP and the HB predictor and also of the cor- 
responding uncertainty measures for the balanced 
one-way random effects model when the parame- 
ter A is known. Indeed, the result is also true for the 
general mixed effects model (see, e.g., Datta, 1992). 
However, this algebraic equality does not quite hold 
for unknown A, or equivalently unknown B. 

To see this, we will consider separately, the EBLUP 
(or EB) and HB estimators, and point out where 
the differences occur. For the given random effects 
model, y ~ iV(X/3, {V + A)!^), which in the Baye- 
sian terminology, is the marginal distribution of y 
after integrating out 6. Based on this marginal pdf, 
{j3,S = ||y — X;9|p) is minimal sufficient for {(3, A). 
Noting that S r^ {V + A)xi,-p, the UMVUE of B = 

V/{V + A) is given by B^^ = V{m -p- 2)/S for 
m > p + 2. The corresponding EB or EBLUP esti- 
mator of 6 is then given by 



e 



EB 



e 



EBLUP 



(2.7) 



V{m — p — 2) 
S 



^Vim-p-2)^^^ 



the James-Stein estimator (James and Stein, 1961). 
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One criticism of the above EB or EBLUP estima- 
tor is that the estimator B^^ oiB can assume values 
bigger than 1 with positive probabihty. The resulting 
EB or EBLUP estimator then pulls the direct es- 
timator y toward the opposite direction of the re- 
gression estimator Pxy- Replacing B by (S )"*", 
where (B^^)^ = ioain{B^^ , 1), the positive part Stein 
estimator rectifies the problem. However, it was 
shown by Datta et al. (2002) that P{B^^ > 1) goes 
to zero at an exponential rate for large m. So, the 
estimator B^^ is usually quite adequate even for 
moderate m. 

In contrast, with the alternative fully Bayesian ap- 
proach (Morris, 1983a), if one assigns the prior 7r(/3, 
^) = 1 so that tt{P,B) = B-^, one gets 7r{e\B,y) 
the same as given in Theorem 1 for a known B, but 
needs in addition 

Tr{B\y) oc B^^^-P^^^ exp(-^Bs]B^^I[0 <B<1] 

= 5('"-P-^)/2gxp| — L55 J/[0 <B<1]. 

Here, for the sake of simplicity and to present Mor- 
ris's results, we have considered only a uniform prior 
for A. It is certainly possible to consider other pri- 
ors, including inverse gamma priors with appropri- 
ate shape and scale parameters of the inverse gamma 
distribution, so long as the resulting posterior is 
proper. A prior of the form '7r(/3, A) = A~^ will yield 
a proper posterior provided fc < 1 and m>p — 2k + 2. 
Thus, while the uniform prior Tr{f3,A) = 1 yields 
a proper posterior when m> p + 2, the priors A~^ or 
A~'^ will always yield improper posteriors. For the 
uniform prior, the posterior mean of 9 is now ob- 
tained by replacing B in Theorem 1 with E{B\y), 
while y(0|y) = y[(l - E{B\y))I^ + E{B\y)P^] + 
V{B\y){y - Pxy)(y - i^xy)^- Thus, other than the 
replacement of B by E(B\y) in the variance formula 
given in Theorem 1, the additional uncertainty due 
to estimation of B is also incorporated in this vari- 
ance formula. 

Integrating by parts, one can show that for large m, 
E{B\y) can be approximated by {m — p — 2)V/S (cf. 
Theorem 1 of Datta and Ghosh, 1991a). Similarly, 
V{B\y) can be approximated by 2{m — p — 2)V/S'^. 
With these approximations, E{0\y) is approximated 
by 0^^ , while V{6\y) can be approximated as V{1 — 

(r,^^)j^^ + (H^^P^P^ + 2(m.-g-2)y^ (y _ 

^/3)(y — X/9) . These results agree with Morris' 
(1983b) intuitive approximations for E{6\y) and 



V{0\y) for the special case of intercept model. In 
addition, if instead of the posterior mean, one es- 
timates B by its posterior mode, one gets the esti- 
mator B^^^ = min((?7T- — p — 4:)V/S, 1), which leads 
to an estimator of 6 quite akin to the positive part 
James-Stein estimator, the only difference being that 
m — p — 2 is now replaced hy m — p — 4. 
It is instructive to find the Bayes risk of 6^^ under 



squared error loss L{6,a) 
theorem is proved. 



|0 — a|| . The following 



Theorem 3. Let m> p + 2. Then writing ha 
xf (X^X)-ixi for all i: 

(a) 
E[{9, - ef^)f = V{1 -B) + VBhu + .^^^^^ " ^' 



m — p 



(b) 



'EB 



E\\e-e f = V[m-{m-p-2)B]. 



Proof. Let B = V{m-p- 2)/S. If Oi = E[ei\(5, 
A, y], then Oi = yi- B{yi - xj(3) and V[e,\(3, A, y] = 
V{1 — B). Using iterated expectation it follows that 

(2.8) E[{e, - ef^)f = v{i -B) + E[{ei - ef'')f. 

Using the expressions of 6i , Of^ , and independence 
of /3 and y — X/3, it follows that 

= E[{B^J CP - fi)f] 
(2.9) 

+ E[{B-B)\m-^J(3f] 

= VBhii + E[{B - Bfivi - xf^)2], 

where ha = x?'(X^X)~-'^Xj. By Basu's theorem, S 
and [ni — :x.J /3)'^/S are independently distributed 
(see Ghosh, 1992a). Then 

E[{B-B)\yi-^Jpf] 
(2.10) 

= E[SiB-Bf]E[{y,-^Jpf/S]. 

By a simple calculation E[S{B - B)"^] = 2VB. Also, 
by the independence of S and {yi — xJ"/3)^/<S', 

E{y,-^fM' 



E[iy,-^iPy/S] 



(2.11) 



E{S) 

{ayB){l-hu) 
{a'^/B){m-p) 

l-hji 
m — p 
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Combining (2.8)-(2.11), one gets (a). Summing 
both sides of (a) over i, and noting ^'^i ha = 
tr[(X^X)-i • (X^X)] =p, one gets (b). D 

Remark 4. It is interesting to observe that a com- 
parison of Theorem 3 with Theorem 1 (or Theo- 
rem 2) reveals that the excess Bayes risk due to es- 
timation of the unknown variance component A is 
simply 2VB. It is easy to see from Theorem 1 or 2 
that the Bayes risk with known A \s V[m{l — B) + 
pB]. 

Remark 5. Another interesting observation from 
Theorem 3 is that an unbiased estimator of the MSE 
is Vim g — —] which is simply Stein's un- 
biased estimator. While this is in agreement with 
equation (1.18) of Morris (1983b), our expression 
for the component MSE given by part (a) in Theo- 
rem 3 agrees with equation (1.16) of Morris (1983b) 
only in the special case of an intercept model, that 
is, when Oi = ii + Ui (i = 1, . . . , m). We believe that 
this is due to an oversight in Morris (1983b) in the 
derivation of the component risk for the general re- 
gression model. 

We will now see how the above results can be gen- 
eralized with unequal numbers of observations in the 
different small areas. 

3. SHRINKAGE ESTIMATORS FOR 
UNBALANCED DATA 

The equal sampling variance scenario considered 
in the previous section hardly arises for small area 
problems, where sampling variances for small areas 
are almost always unequal. A widely used area-level 
model first introduced by Fay and Herriot (1979) is 
given by 

(3.1) yi\ei'^^ NiOi.Vi), ei'^^N{^J(3,A). 

Clearly the above model can be viewed also as a ran- 
dom effects model as shown in the previous section. 

Fay and Herriot used the above model for estimat- 
ing the per capita income (PCI) for small places in 
the United States with population less than 1000. In 
their case, yi is the logarithm of per capita income 
for the ith small area. The auxiliary variables con- 
sidered were logarithms of the PCI for the associated 
counties, tax return data, data on housing from the 
previous decennial census. The Fay-Herriot method 
was adopted by the U. S. Bureau of the Census to 
provide updated PCI estimates for small areas. 

Fay and Herriot adopted an EB approach in their 
analysis. Write G = Diag(VL, . . . , Vm), D = G + AIm, 



B = GD-i = D-iQ = Biag{Bi,...,B.m), where 
Bi = Vi/{Vi + A),i = 1, . . . ,m. First, assuming /3 and 
A to be both known, the Bayes estimator of is 
6 = (Im — B)y + BX/3. In order to estimate (3 
and A as needed in an EB approach, first observe 
that for A known, the generalized least squares es- 
timator of /3 is 

P{A) = (X^D~^X)~^X^D~V 
(3.2) 

= [X^(I„, - B)X]-iX^(I„ - B)y, 

where we assume, as before, rank(X) =p{< m). We 
may note here that the corresponding BLUP esti- 
mator of is (Im — B)y + BX/3(A). In order to es- 
timate A as well, Fay and Herriot (1979) and Datta, 
Rao and Smith (2005) used the moment identity 
given by E[YZl{y^-^JhA)? /{Vi + A)]=m-p. 
Dropping the expectation from the left-hand side we 
get 



(3.3) Y.{y,-^J~p{A)Y/{V, + A) 



m — p. 



i=l 



Since the expression in the left-hand side of (3.3) 
is a nonincreasing function of yl, if this expression 
evaluated at A = is less than m — p, there will be 
no solution to the above equation. In this case, the 
estimate is taken to be zero. In the other case, taking 
an initial guess at A and solving (3.2) and (3.3) it- 
eratively, one finds the estimators A and = P{A). 
The resulting EB or EBLUP estimator of 6 is given 
by 



(3.4) 



e 



EB 



B)y + BX/3, 



where B = Diag(yi/(yi + i), . . . , V^/iVm + A)). 

Morris (1983b) provided a general discussion of 
the EB approach in this case with the same pre- 
scription for estimation of f3 and A. An alternative 
HB formulation analogous to the one in Section 2 is 
given by Ghosh (1992a) who also explored an inter- 
relationship between the EB and the HB procedures. 
The HB model is given by 



ind 



yi\ei,l3,A'!^^N{di,Vi), 

(3.5) ei\f3,A''rt'N{^J(3,A), 

i = l, . . . ,m, 7r{(3,A) = l. 
Then the joint posterior density is 

7r{e,P,A\y) 

(3.6) ocA-'"/2exp[-i{(y-6>)^G-i(y 



+ A-^e-Xf3f}]. 
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Then one gets 6'|/3,a,y ~ iV[(I^ - B)y + BX/3, 
G(I™-B)],/3|Ay~iV[/9(A),^{X^(Im-B)Xri], 
where p{A) = [X^(I^ - B)X]-iX^(I^ - B)y. The 
marginal posterior of A is 



(3.7) 

m 



1/2 



exp 



My) 



where Q{y) = A-'[J:Z,{1 - B,)yf - {j:T=ii^ " 
It follows now that 



(3.8) E{e\y) 



E{B\y)]y + E[BH^y\y], 



Vie\y)=E[{I„,-B}G\y] 
(3.9) +E[{B{Ira-Hx)}G\y] 

+ V[B{y-XP{A)}\y], 

where Hx = X[X'^(I„, - B)X]-iX^(I^ - B). Nu- 
merical integration involving one-dimensional inte- 
grals needs to be carried out for evaluating both 
E{6\y) and V{0\y). In the special case, when Vi = 
■ ■ ■ = Vm , these expressions simplify to the ones ob- 
tained in the previous section. This is because in this 
special case, I^ — B = (1 — B)Im- We may also reem- 
phasize that the first component in the right-hand 
side of (3.9) is the posterior variance when both (3 
and A are known. The second term provides addi- 
tional uncertainty due to unknown /3 but known A. 
The third term accounts for additional uncertainty 
due to unknown A as well. 

In the Bayesian framework, posterior variances are 
the natural uncertainty measures. In the frequentist 
approach, a naive method is to substitute A by some 
suitable estimator in the mean squared prediction 
error formula for the BLUP [cf. (2.5) and (2.6) for 
the balanced case]. In the unbalanced case, the mean 
squared and product matrix of prediction error for 
the BLUP is given by the sum of the first two terms 
in the right-hand side of (3.9) without the condi- 
tional expectation operator. As it appears, this will 
miss the third component as it will not account for 
uncertainty due to estimation of A. This results in 
an underestimation of the true MSE of the EBLUP. 

To account for the error in estimating A, follow- 
ing an earlier work of Kackar and Harville (1984), 
Prasad and Rao (1990) considered the MSE of the 
EBLUP. Unlike in the balanced case of Section 2, 



there is no closed- form expression of this MSE. They 
obtained an asymptotic expression of the MSE which 
is accurate to the order o{m~^). This approximation 
is based on an orthogonal decomposition of the MSE 
E{6f^ — Oi)^ . Specifically, they used the decompo- 
sition 

Of"" -Oi = {(1 - Bi)yi + S,xf/3 - Oi} 

+ {Of - (1 - Bi)yi - Bi^llS} 

the first component is always orthogonal to the sec- 
ond and the third components. The orthogonality of 
the last two components holds only for certain spe- 
cific estimators of ^. It is necessary that these esti- 
mators are translation invariant under the transfor- 
mation 5c (y) which maps y to y -|- Xc and are even 
functions of y. In particular, the AN OVA estimator 
in Prasad and Rao (1990), the ML and the REML 
estimators considered in Datta and Lahiri (2000), 
and the method of moment estimator due to Fay 
and Herriot (1979), Morris (1983b) and Datta, Rao 
and Smith (2005) all satisfy these conditions. For 
these estimators of ^4, it follows that 

E{ef^ - e,f 

= gii {A) + g2i{A)+g:ii{A) + o{m-^), 

where gu{A) = Vi{l - Bi), g2i{A) = BfA^f ■ 
{E7=i(l - S,)x,xJ}-ix, and gs,{A) = V?{A + 
Vi)~^\Siv[A). The derivation of the third term is ba- 
sed on a second-order Taylor expansion [i.e., retain- 
ing up to the 0{m^^) term] of E[Bi{yi — :x.f^{A)) — 
Bi{yi — x?'/3(A))]^. This derivation requires also or- 
thogonality of (3 and A in the Fisher ian sense, that 
is, block diagonality of the relevant components of 
the Fisher information matrix. An intuitive estima- 
tor, say, mse {A), of the MSE in (3.10) is given by 

(3.11) mse^{A) 



(3.10) 



--gli{A)+g2^{A) + g3i{A). 



In view of the fact that E[gii{A)] = gu{A) —g3i{A) + 
o{m~'^), and g3i(A) is 0{7n~^), the above estimator 
is not second-order unbiased. Based on the ANOVA 
estimator of A, say, Apu, which is second-order un- 
biased for A, Prasad and Rao (1990) showed that 
the estimator 

(3.12) mse^{ApR) = gipRi + g2PRi + gspm 

is second-order unbiased in the sense that 

E[mse^{App)] = MSE{ef^) + o{m-^), 
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where 
giPRi = 9ii{ApR) + gsii^PR) 

93PRi = 93ii^PR)- 



92PRi =g2i{Apji), 



See Harville (1990) for similar results for mixed lin- 
ear models. In the small area context Datta and 
Lahiri (2000) showed that the expression in (3.12) 
based on the REML estimator of A is also second- 
order unbiased. Second-order unbiased estimator of 
the MSE of the EBLUP using the ML estimator and 
Fay-Herriot estimator of A are given in Datta and 
Lahiri (2000) and Datta, Rao and Smith (2005), re- 
spectively. For further discussion we may refer to 
Rao (2003a) and Datta (2009). 

The posterior variance of 9i, on the other hand 
[see (3.9)], is given by 

V{ei\y) = V^{l-E{Bi\y)] 

( m 

+ E BJA^j\Y,{l-B,)^,^] 



X,; 



= 1 



(3.13) 



+ V[B,{y,-^iP{A)]\y] 

E[gu{A)\y\+E[g2i{A)\y\ 
+ V[B,{yi-i^J~P{A)]\y] 
= giHBi + 92HBi + 93HBi (say) . 

Morris (1983b) provided an approximation to the 
HB estimator E(9i\y) and the associated posterior va- 

- M 

riance. Denoting Morris' point estimator of 6i by 9i , 



(3.14) e,^^=(l-Sf)y, + i?f(x//3), 

where 5f = {{m - p - 2)/(m - p)){Vi/{Vi + A)), 
and f3 and A are obtained by solving (3.2) and (3.3) 
iteratively. It can be checked that (3.2) and (3.3) 
are equivalent to Morris' (1983b) equations (5.2) 
and (5.4). Morris (1983b) approximated the pos- 
terior variance by s'f^j, given by s'fj^^ = CiM + ViM, 
where enM = Qum + 92iM, ViM = gsiM with 



(3.15) 



guM 



gsiM 



M-i 



V^[l-B, 



g2iM - 



vSr^li 



V + A 



m ■ 



p-2 Vi + A 

xf [x^(v + ii)-ix]-ix,/(y, + A), i 



1, 



and t^ 

■■■,m, V = YT=i'^il^- 

From the three measures of uncertainty given 
by (3.12), (3.13) and (3.15) we see a close corre- 
spondence in the respective terms in the expansion 
of the MSE of the EB estimator, the posterior vari- 



ance of 6i and Morris' approximation of the poste- 
rior variance. It is clear, though, that while the pos- 
terior variance of Oi accounts for all sources of uncer- 
tainty in a straightforward way, the EB or EBLUP 
method needs careful evaluation of all terms in the 
MSE expression and construct a second-order unbi- 
ased estimator of this quantity. Morris (1983b) pro- 
vided a clever approximation to the posterior vari- 
ance. The estimator of the MSE of the EBLUP dis- 
plays poor performance when A is estimated by zero 
or severely underestimated (this happens if the true 
variance parameter A is small) . In such case the first 
term gipR is too small compared to the first term in 
the posterior variance. This results from the integra- 
tion of A with respect to its long tail posterior distri- 
bution. Use of posterior variance has been found to 
be attractive in small area application. As an exam- 
ple, the U.S. Bureau of the Census uses this method 
in producing small area income and poverty esti- 
mates based on American Community Survey data. 
The corresponding term in Morris' approximation is 
a clever approximation to the posterior expectation. 
Although not as small as gipR, this also tends to be 
small. The gii{A) function evaluated at the point 
estimator of A, via posterior mode or REML, is usu- 
ally smaller than its integrated value with respect to 
the posterior of A. The second and the third terms 
in these measures of uncertainty, being of lower or- 
der of magnitude, usually show a greater degree of 
agreement. Another attractive feature of posterior 
variance is that it depends on the individual small 
area observation yi [through the last term in (3.13)]. 
This is not true for the second-order unbiased esti- 
mator of the MSE given in (3.12). However, the esti- 
mate of conditional frequentist mean squared error 
of prediction obtained by conditioning on y^ depends 
on the individual small area observation (see, e.g.. 
Booth and Hobert, 1998, or Datta et al., 2011). For 
related discussions comparing the Bayesian and the 
frequentist measures of uncertainty in small area es- 
timation we refer to Singh, Stukel and Pfeffermann 
(1998) and Datta, Rao and Smith (2005). Morris' 
approximation, which closely mimics the posterior 
variance, also enjoys this feature. 

We consider an illustration of the Fay-Herriot mo- 
del. The U.S. Department of Health and Human Ser- 
vices (HHS) needs estimates of four-person family 
state median income to implement an energy assis- 
tance program to low-income families. The Bureau 
of the Census (BOC) has provided such estimates 
for nearly thirty years. The BOC now uses the Fay- 
Herriot model to provide more sophisticated esti- 
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Table 1 
Data for estimating 1,979 four-person family median income for the 15 southeastern U.S. states, and different small area 

estimates 



State 



V 



^,HB 



^,EB 



^,M 















DE 


21,860 


23,103 


MD 


26,235 


27,607 


VA 


24,160 


25,514 


WV 


18,274 


21,807 


NC 


20,296 


21,408 


sc 


19,282 


21,706 


GA 


22,687 


22,599 


FL 


19,675 


23,944 


AL 


17,978 


22,233 


KY 


18,657 


21,359 


TN 


19,776 


21,240 


MS 


19,167 


19,887 


AR 


18,917 


20,214 


LA 


18,965 


22,861 


OK 


19,295 


23,668 



1,900^ 
1,722^ 
1,418^ 
1,380^ 
1,012^ 
1,795^ 
1,196^ 
1,042^ 
1,282^ 
1,285^ 
1,274^ 
1,762^ 
1,507^ 
1,444^ 
1,675^ 



21,185 


21,787 


21,031 


21,088 


21,802 


21,025 


25,399 


26,145 


25,221 


25,227 


26,134 


25,090 


23,418 


24,080 


23,264 


23,403 


24,040 


23,262 


19,133 


18,367 


19,330 


19,027 


18,397 


19,160 


19,634 


20,223 


19,472 


19,849 


20,133 


19,712 


19,448 


19,299 


19,472 


19,452 


19,296 


19,454 


21,217 


22,524 


20,842 


21,510 


22,402 


21,199 


20,884 


19,807 


21,174 


20,480 


19,941 


20,700 


19,273 


18,119 


19,575 


19,047 


18,187 


19,264 


19,008 


18,695 


19,087 


18,954 


18,716 


19,017 


19,351 


19,729 


19,239 


19,430 


19,707 


19,350 


18,360 


19,075 


18,131 


18,371 


19,097 


18,274 


18,388 


18,858 


18,250 


18,452 


18,859 


18,383 


19,996 


19,078 


20,240 


19,878 


19,096 


20,020 


20,578 


19,436 


20,894 


20,535 


19,418 


20,673 



mates. In this model the direct estimate of the fom'- 
person family state median income, to be denoted 
by 2/j, is obtained from the Current Population Sur- 
vey (CPS). Auxiliary variables for the multiple re- 
gression model are obtained from the per capita in- 
come information of the Bureau of the Economic 
Analysis (BEA) survey and the latest census data 
for the four-person family median income. In our 
illustration, we will consider only a subset of the 
U.S. states and use only one covariate. We consider 
15 U.S. states belonging to the southeast U.S. geo- 
graphical region. While there are 17 states in this re- 
gion, we excluded Texas and Washington, DC, from 
our analysis as these two small areas have their sam- 
pling variances (T^'s) very much different from the 
remaining 15 states. In the notation of this section, 
we have m = 15, p = 2, x^ = {l,Xi), with Xj, the 
adjusted census median income, given by 

_ BEA PCI(c) for state i 
^' ~ BEA PCI(6) for state i 

■ Census median (b) for state i, 

where c stands for current year (in our application 
1979) and b stands for base year (1969), BEA PCI(6) 
and BEA PCI(c) are obtained from the BEA data 
for these two years, and Census median(6) is ob- 
tained from the 1969 census. 

We present the relevant data in Table 1 below. 
Also included in the table are the EB estimates 
0bEB jj^ ^YiQ balanced case, and 6^^^ in the unbal- 
anced case), the HB estimates (9^^^ in the balanced 



case, and 0^^^ in the unbalanced case) and Morris' 
approximation to the HB estimates {6^^^ in the bal- 
anced case, and ^"^^ in the unbalanced case). As 
noted before, the sampling variances are different 
for the states and the resulting Fay~Herriot model 
is an unbalanced model. To compare the frequentist 
and the Bayesian approaches for both the balanced 
and the unbalanced setup, we have illustrated the 
balanced Fay-Herriot model given by (2.1) by re- 
placing each Vi by their average 2,162,469. Prom the 
last six columns of Table 1, we note that the point 
estimates of the small area means do not differ sub- 
stantially either over EBLUP, HB or Morris' esti- 
mates, or if the setup is a balanced or an unbalanced 
Fay-Herriot model. It is usually our experience that 
the model-based small area point estimates are sub- 
stantially robust over varying sampling variances or 
over the method of estimation, Bayes or frequen- 
tist. 

In Table 2 we include various components of the 
uncertainty measures for the Prasad-Rao estimated 
MSE, the posterior variance of the HB estimates and 
Morris' approximation to the HB moments. From 
these components we can get the relevant overall 
uncertainty measure for the EBLUPs, the HB esti- 
mates and Morris' approximation of EB estimates. 
We note that in the balanced case the relative re- 
duction in the Prasad-Rao estimated MSE over the 
sampling variance (the measure of uncertainty for 
the direct estimates) ranges between 21 and 62 per- 
cent. These numbers clearly show substantial gain in 
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Table 2 
Decomposition of various measures of uncertainty of the model-based small area estimates 



State 


Setup 


9lHB 


92HB 


QsHB 


9ijvf 


92M 


QsM 


9lPR 


92PR 


dsPR 


DE 


Balanced 


792,210 


97,376 


44,113 


459,930 


120,989 


124,922 


418,657 


142,987 


268,279 




Unbalanced 


1,129,602 


121,110 


39,030 


937,376 


116,803 


75,548 


377,191 


139,692 


194,412 


MD 


Balanced 


792,210 


696,375 


66,040 


459,930 


865,242 


187,017 


418,657 


1,022,559 


268,279 




Unbalanced 


1,039,678 


858,915 


103,302 


828,078 


915,537 


168,214 


409,823 


1,157,267 


229,034 


VA 


Balanced 


792,210 


295,960 


51,495 


459,930 


367,728 


145,828 


418,657 


434,588 


268,279 




Unbalanced 


863,313 


277,090 


77,930 


656,YYY 


332,909 


142,517 


485,759 


453,886 


310,056 


WV 


Balanced 


792,210 


106,701 


71,651 


459,930 


132,575 


202,908 


418,657 


156,680 


268,279 




Unbalanced 


839,191 


84,658 


67,832 


636,503 


100,218 


144,901 


497,472 


135,006 


322,621 


NC 


Balanced 


792,210 


125,904 


43,600 


459,930 


156,435 


123,472 


418,657 


184,878 


268,279 




Unbalanced 


583,606 


60,730 


43,268 


447,884 


85,908 


98,855 


639,733 


139,644 


477,672 


sc 


Balanced 


792,210 


110,835 


2,312 


459,930 


137,711 


6,548 


418,657 


162,750 


268,279 




Unbalanced 


1,077,706 


126,061 


1,807 


872,021 


125,390 


3,518 


395,558 


152,055 


213,886 


GA 


Balanced 


792,210 


91,351 


218,486 


459,930 


113,503 


618,727 


418,657 


134,140 


268,279 




Unbalanced 


716,079 


57,972 


235,098 


540,917 


74,935 


508,347 


562,280 


108,923 


392,592 


PL 


Balanced 


792,210 


134,749 


144,408 


459,930 


167,426 


408,947 


418,657 


197,866 


268,279 




Unbalanced 


605,744 


73,260 


125,413 


462,990 


103,414 


293,078 


626,452 


166,675 


462,923 


AL 


Balanced 


792,210 


94,683 


163,838 


459,930 


117,643 


463,970 


418,657 


139,033 


268,279 




Unbalanced 


774,894 


66,728 


156,347 


585,124 


82,616 


342,043 


530,290 


115,370 


357,952 


KY 


Balanced 


792,210 


128,846 


11,885 


459,930 


160,090 


33,657 


418,657 


189,198 


268,279 




Unbalanced 


776,906 


93,056 


13,089 


586,680 


115,699 


26,677 


529,228 


162,238 


356,806 


TN 


Balanced 


792,210 


136,323 


18,482 


459,930 


169,381 


52,340 


418,657 


200,177 


268,279 




Unbalanced 


769,515 


97,600 


16,381 


580,981 


122,136 


37,950 


533,140 


172,245 


361,029 


MS 


Balanced 


792,210 


269,811 


68,948 


459,930 


335,238 


195,254 


418,657 


396,191 


268,279 




Unbalanced 


1,060,717 


318,630 


40,995 


852,009 


327,641 


98,442 


401,844 


405,527 


220,560 


AR 


Balanced 


792,210 


239,946 


28,598 


459,930 


298,131 


80,987 


418,657 


352,337 


268,279 




Unbalanced 


918,002 


229,443 


18,346 


705,154 


260,944 


45,703 


460,413 


342,328 


282,932 


LA 


Balanced 


792,210 


92,944 


104,354 


459,930 


115,483 


295,519 


418,657 


136,480 


268,279 




Unbalanced 


879,552 


79,807 


86,164 


670,775 


91,992 


191,249 


478,058 


121,522 


301,805 


OK 


Balanced 


792,210 


118,713 


164,196 


459,930 


147,500 


464,983 


418,657 


174,318 


268,279 




Unbalanced 


1,014,326 


130,894 


113,081 


800,405 


138,730 


255,182 


419,730 


174,242 


239,566 



the accuracy of the model-based estimates. In the 
unbalanced setup, these relative reductions range 
between —23 and 80 percent; only two states, NC 
and FL, have negative improvement, which is some- 
what surprising. However, these two states being 
direct-use states in the CPS, perhaps they enjoy 
large sample size to produce relatively accurate di- 
rect estimates. Also, for these states, the 53 term is 
relatively big resulting in a large estimated MSE of 
the EBLUP. The corresponding improvement num- 
bers for the HB estimates are between 26 and 64 
percent in the unbalanced case, and 28 and 58 per- 
cent in the balanced case. For Morris' approxima- 
tion, these numbers are between 21 and 69 percent 
in the unbalanced case, and 30 and 72 percent in 
the balanced case. 



In Table 2 we present the decomposition of the 
uncertainty corresponding to the three sources: un- 
certainty due to estimation of unknown small area 
mean, uncertainty due to estimation of the regres- 
sion coefficients and uncertainty due to unknown 
variance components. We consider the mean squared 
error of an EBLUP (or EB predictor), the poste- 
rior variance and its approximation due to Morris 
(1983b) for both the balanced and an unbalanced 
setup. From this table we find that for each method 
of estimation and each setup, all the three com- 
ponents of uncertainty contribute substantially to- 
ward the overall measure of uncertainty for most 
small areas. Thus it is important to account for the 
uncertainty in estimating the regression coefficients 
and the variance components in deriving a reliable 
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overall measure of uncertainty associated with the 
model-based small area estimates. 

In this example in the balanced Fay-Herriot setup 
the estimate of A obtained by Prasad-Rao or Mor- 
ris' method is 16,1617, which is substantially smaller 
than the HB estimate given by 1,735,616. The lat- 
ter estimate is more than ten times the frequentist 
estimate and it results from a very long tail of the 
posterior distribution of A. This larger HB estimate 
of A results in a substantially bigger value of the 
first component (the gi term) of the Bayesian mea- 
sure than the corresponding component in the other 
measures. In fact the frequentist estimate of A is so 
small that, contrary to our expectation, for some ar- 
eas the estimate of the gi term is not the dominant 
term in the estimated mean squared error (see the 
columns for Morris' approximation and Prasad-Rao 
estimates) . 

We notice that the picture does not change sub- 
stantially when we consider the unbalanced setup. 
Here again, the posterior density of A has a long 
tail resulting in a posterior mean of 2,063,419. The 
Prasad-Rao estimate is again far too small, only 
192,527, and Morris's estimate is in between, which 
is 515,969, much larger than the Prasad-Rao esti- 
mate but much smaller than the HB estimate. 

We reiterate that all three components contribute 
substantially toward the overall measure of uncer- 
tainty. In particular, from the seventh and the eighth 
columns of Table 2, we note that the third term 
(the 33 term) is bigger than the second term (the g2 
term) in 14 of the 30 rows. From the last two columns 
of the table, we note that the third term is bigger 
than the second term in 22 of the 30 rows. All these 
indicate that ignoring this component in the fre- 
quentist estimate of MSE or Morris' estimate will re- 
sult in a severe underestimation. It is particularly so 
for the Prasad-Rao frequentist MSE since the first 
term {gi term) is also adjusted for bias by adding 
the 53 term. Incidentally, the HB measure of un- 
certainty automatically accounts for all sources of 
uncertainty. 

We conclude this section noting that here and in 
the previous section we assumed the sampling vari- 
ances Vi^s are known. This assumption was neces- 
sary to avoid the identifiability problem. If addi- 
tional independent estimates (independent of Yi's) 
of Vj's are available, and V^'s depend on a finite 
number of parameters, then the previous results can 
be extended to develop model-based small area es- 
timates of the means and their measures of uncer- 
tainty. It can be done for both the EBLUP and HB 



approaches. This is essentially similar to the unit- 
level model considered briefly in Section 6. However, 
ifVi^s cannot be assumed to depend on a finite num- 
ber of parameters, then the mean squared approx- 
imation results presented here do not hold. In this 
scenario Wang and Fuller (2003) assumed that in- 
dependent Vi,i = l,...,m, are available which are 
unbiased for Vi. Assuming independent chi-squared 
distributions of these estimates, they derived MSE 
approximation of the EBLUP of 6i. Their approx- 
imation is valid provided both m and d, the min- 
imum of the degrees of freedom of the chi-squared 
distribution, are large. Their approximation to the 
MSE is accurate only to the order of d~^''^. We refer 
to this article for details. Another related paper in 
this setup is by Rivest and Vandal (2004). 

4. EXTENSIONS 

The Fay-Herriot (1979) model discussed in the 
previous section can be extended in different direc- 
tions. First, instead of y\6 ~ N{9,G), where G = 
Diag(Vi, . . . ,Vm), one can begin with y|0 ~ N{6,\'), 
where V is a known positive definite matrix which 
is not necessarily diagonal. The full model is thus 

(4.1) y|0~Af(0,V), 0r^N{'Xp,AIm). 

Datta et al. (1992) considered this model in the con- 
text of adjustment of census undercounts. It is easy 
to check for A{> 0) known, the BLUP (or the HB 
predictor with a flat prior for (3) is given by 

(4.2) 0^ = (I„-B)y + BX;3(A), 

where B = (V + AIrr,y^V and P{A) = [X^(V + 
AI^)-iX]-iX^(V + ^I„)-V- With A unknown, 
one can opt either for estimation of A from the 
marginal distribution of y, namely, A^(X/3, V + AIm) 
or put a flat prior for A, that is, 7r((3,A) = 1. Datta 
et al. (1992) tried both methods in the context of 
adjustment of census counts based on 1988 Missouri 
Dress Rehearsal data, but found very little differ- 
ence in the estimation of 0. 

The work of Datta et al. (1992) is based on model- 
ing the adjustment factors related to census counts. 
To be specific, let Tj denote the true count and 
the Ci the census count for the ith. small area. Then 
Cressie (1989) and Isaki, Huang and Tsay (1991) 
proposed modeling 9i = Ti/Ci (i = 1, . . . , m). 

Direct estimates of these adjustment factors are 
usually obtained from a postenumeration survey 
(PES) conducted by the Bureau of the Census. In 
1990, the Bureau of the Census produced PES es- 
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tiinates of the adjustment factors for 1,392 subdivi- 
sions (poststrata) of the total population. The PES 
sample contained approximately 377,000 persons in 
roughly 5,200 census blocks. However, prior to the 
1990 census, the Census Bureau had a trial run for 
several test sites in Missouri to obtain direct esti- 
mates of these adjustment factors based on (pur- 
ported) complete enumeration and PES. Datta et 
al. (1992) conducted an evaluation of this so-called 
Census Dress Rehearsal Data using the method de- 
scribed earlier in this section. 

The HB and the EB estimators of 6 based on (4.2) 
are given respectively by 

(4.3) e^^ = [Im - i?(B|y)]y + ii;[BX^(A)|y], 

(4.4) 0^^ = [I„-B]y + BX;3(i). 

The posterior variance V{6\y), as before, is given by 
V{e\y)=Y{l^-E{B\y)] 

+ ^[BX{X^(V + AI„)-iX}-iX^B^|y] 

+ Var[B{y-X^(A)}|y]. 

This was found numerically very similar to the plug- 
in estimate of the second-order approximate MSE 
given by 

« V(I^ - B) + BX{X^(V + ^i^)-i}x'^B^ 
+ 2VK3V[tr(V~2)]-i^ 

where 

K = (V + AI„)-i 

- (V + ^i^)-ix{x^(v + Aimr^y.}-^ 
• x^(v + Ai„)-i. 

The study of Datta et al. (1992) revealed that for 
every poststratum, the EB (or EBLUP) and HB es- 
timators of the adjustment factors outperformed the 
direct estimators. 

There is also a multivariate extension of the Fay- 
Herriot (1979) model considered in Datta, Fay and 
Ghosh (1991). Now the data consist of yi,y2, . . . lYm-, 
where each yj is r-dimensional. Bivariate and trivari- 
ate versions of the model were used in Datta, Fay 
and Ghosh (1991), and later in Datta et al. (1996) to 
estimate median income of four-person families for 
the 50 states and the District of Columbia. They 
considered the random effects model 

(4.5) yj = Xj/3 + Uj + ej, i = l,...,m, 

where Uj '~ ' A^(0, A) and ej ~ A^(0,Vj), the Uj 
and the Cj being mutually independent, and the 



Vi{> 0) are known. Alternatively, in a Bayesian fra- 
mework, writing Oi = Xj/3 + Uj (i = 1, . . . ,m), yj| 

Oi '~ A(0i, Vi) and Oi '~ A(Xi/3, A). Both EB (or 
EBLUP) and HB estimators of the Oi were found. 
These estimators were shown to outperform the di- 
rect estimators with respect to their precision mea- 
sures. 

5. CONFIDENCE INTERVALS IN SMALL 
AREA ESTIMATION 

Morris (1983b) noted that although Stein's shrink- 
age estimators were widely used for point estima- 
tion, a lack of the availability of estimated uncer- 
tainty with these estimators delayed development 
of reliable confidence intervals. An early attempt 
to construct EB confidence intervals is due to Cox 
(1975). In the small area estimation terminology, 
he developed approximate confidence intervals that 
are accurate to the order of 0{m~^) for an individ- 
ual small area mean Oi for the balanced Fay-Herriot 
model without any covariate. Again in the small area 
estimation terminology, Morris (1983a, 1983b) was 
the first to consider confidence intervals for small 
area means for the Fay-Herriot model with covari- 
ates. He considered both the balanced and the un- 
balanced sampling variance cases. His method con- 
sists essentially in finding an HB confidence inter- 
val for 9i^ approximating (using Laplace approxi- 
mations to integrals) this interval with estimates 
of the hyperparameters only at the last stage. He 
constructed these intervals using normal percentile 
points and provided a heuristic justification of these 
naive EB intervals. Later Laird and Louis (1987) 
proposed EB bootstrap confidence intervals in the 
spirit of Morris (1983a, 1983b), while Carlin and 
Gelfand (1990), following a suggestion of Efron, pro- 
posed calibrating the naive EB confidence intervals. 
Indeed, in small area estimation setup, both for unit- 
level and area-level data, Prasad and Rao (1990) 
also suggested approximate confidence intervals for 
small area means. They based their intervals on nor- 
mal percentile points and used their second-order 
unbiased estimator of the MSE of the EBLUP. As 
in Morris (1983a, 1983b), Prasad-Rao intervals also 
have a coverage error to the order of 0{'m~^). 

For the case when Vi = . . . , Vm = V and A are 
both known, a flat prior for /3 will result in a 100(1 — 
a)% confidence interval of the form (1 — B)yi + 
i?xf;9± 2:^/2^"^''^ (1 — B + Bm~^y/^ , where we may 
recall that JS = (X-^X)^-^X'^y, the least squares es- 
timator of (3. This result follows immediately from 
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Lindley and Smith (1972). A naive EB confidence in- 
terval is given by (1 - B)yi + Byiffi ± Zq/2V^^^^(1 - 
i?)^", which does not take into account uncertainty 
due to estimation of (3. Accordingly, the coverage pro- 
bability will fall short of the target under the said hi- 
erarchical model. The Type III bootstrap approach 
of Laird and Louis (1987) provides a confidence in- 
terval identical to the hierarchical Bayesian approach, 
where the bootstrap samples y are drawn from the 
A^(X/3, {V + A)\rn) pdf. The same confidence inter- 
val is also arrived at by the conditional approach of 
Hill (1990). Hill's approach consists of finding the 
conditional distribution oi Oi — {{1 — B)yi -\- B:k[(3) 
given the ancillary statistic Ui = yi — x^^S. Also, it 
is pointed out by Laird and Louis (Theorem 2.1, 
page 743) that the Type HI bootstrap can never 
match a hyperprior solution when A is unknown. 

For the balanced Fay-Herriot model, Datta et al. 
(2002) developed an expansion for the coverage pro- 
bability of confidence intervals derived by Morris 
(1983a, 1983b) and Prasad and Rao (1990). Based 
on this expansion they perturbed the endpoints of 
the confidence interval to achieve asymptotic cov- 
erage accurate to the order of o{m~^). Also, fol- 
lowing the framework of Hill (1990), Datta et al. 
(2002) studied conditional coverage probabilities of 
such intervals even for unknown A by conditioning 
on a suitable ancillary statistic. They obtained an 
expansion of the conditional coverage probability as 
well and used the expansion to better calibrate the 
interval. For some d> p let B{S) = Bd{S) = {m — 
d) imu.{V / S , {m — p)~^}- Assuming maxi<j<m/iM = 
0{m~^), where ha is defined in Theorem 3, they 
had for any fixed t > the following expansion. 



Theorem 4. 



P[ei G (1 - B{S))Yi + B{S)xi (3 ± tVil - B{S)r'] 



1/2] 



2$(t) - 1 



- tm 
+ 0{m 



+ 



B 



- (1 + t^)^^ 
_2m(l-B)2 ' l-B 

-3/2N 



hii + 



5-d 



m 



Let Za/2 denote the upper a/2 point of A^(0, 1) 
distribution. Taking t = Za/2 will result in an under- 
estimation in the nominal coverage 1 — a. If we take 



2q/2 



, , (^ + ^l/2)B' ^ (5- 



d + mhii)B 



4m(l-B)2 2m{l-B) 

it follows that the interval (1 - B{S))Yi+B{S)xJ ^± 
t*V{l — B{S))^''^ has coverage probability equal to 



1 — a up to 0(7Ti~'^'2) error terms. Although this 
theorem is presented in the context of EB inter- 
vals, Datta et al. (2002) also discussed expansion of 
coverage probabilities of intervals that are created 
through the HB argument of Morris (1983a). 

Extending the argument of Hill (1990), Datta et 
al. (2002) also obtained an expansion of the cover- 
age probability of an EB confidence interval of 9i 
by conditioning on an ancillary statistic U = {Yi — 
yi^0)y\m — p)/\/S. They proved the following the- 
orem. 

Theorem 5. 

P[9i E (1 - B{S))Yi + B(5)xf ^ 

±tV{l-B{S))^^^\U] 
= 2$(t)-l 



tm 



+ Op{m 



(1 + ^2)^2 (2C/2 + 3-d)S 



2m(l-S)2 m{l-B) 

Bhn 



+ 



l-B 



-3/2N 



The bias corrected confidence intervals for 6i are 
obtained as before with appropriate changes. 

Datta et al. (2002) performed a simulation study 
to evaluate the performance of the approximate con- 
fidence intervals given in the two theorems above. 
In these simulations they used a simple setup with 
TO, = 30 small areas with no covariates. Since the 
coverage probability does not depend on /3, it was 
taken as zero in generating the samples. Also, the 
coverage probability depends only on i3, so without 
any loss of generality V was taken to be 1. These 
authors considered various values of B in the range 
0.025 to 0.975. They computed both conditional and 
unconditional coverage probabilities as discussed in 
the theorems given above. They found little qualita- 
tive difference in performance between the uncondi- 
tional and conditional coverage probabilities. They 
also noted that while the extent of underestimation 
of the coverage probabilities with t = z^/2 from the 
nominal level a was small for small B, the underes- 
timation was severe for B in the upper half. On the 
other hand, the adjusted intervals appeared to be 
too large resulting in overestimation of the coverage 
probabilities. This overestimation is due to an over- 
estimation of the mean squared error of the EB esti- 
mator of 9i. Incidentally, Lahiri and Rao (1995) also 
noted similar overestimation of the MSE when B 
approaches 1, that is, when A/V approaches 0. 
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Smith (2001) in his unpubhshed Ph.D. disserta- 
tion developed EB confidence intervals for the iih 
small area mean 6i for the more practical case of un- 
balanced Fay-Herriot model in (3.1). Associated with 
the EB or EBLUP 9f^ of Oi, let s'f denote some esti- 
mated measure of uncertainty. Note that sf could be 
a second-order unbiased estimator of the MSE of Of 
as in (3.12) or something similar. For some estima- 
tor A of A, Smith (2001) defined sf = hf{A) + Ci, 
where hf{A) = gu^A) + g2i{A). The term Cj is an 
Op{m~^) order term, that may depend on A and 
the data Y, and may be related to g^i term in (3.10) 
and bias term of A. There are many possible choices 
corresponding to various MSE estimates. Rao (2001) 
proposed a number of area-specific estimators of the 
MSE of the EBLUP, and they can be included by 
proper choice of Cj. Alternatively, in the HB setup, 
Cj may include {yi — x?'^)^ Var(i?j(^)|y), which is 
an approximation to the last term in the posterior 
variance in (3.13). This general choice enabled Smith 
to study approximate coverage probabilities of confi- 
dence intervals constructed in Morris (1983a, 1983b) 
by using EB and HB methods. Corresponding to Cj, 
let the parametric function c*{A) be such that q — 
c*{A) = Op{m-^). Also, define qi{A) = Bf{A)b^{A) + 
c*{A) — 2g3i{A), where b^{A) is the asymptotic bias 

of A. With the above notation we now state Theo- 
rem 1.7.1 of Smith (2001) below. 

Theorem 6. For any z>0, 



pW - zsi <e,< Of'' + zsi] 
= 2^{z)-l + z(i){z) 



^{A) 



{z^ + z)^{z)g:,i{A) Df 



+ 0(171 ^). 



AhfiA) (A + D, 

Note that the leading term in the above expansion 
is the nominal coverage probability. The first-order 
error term in this expansion is of order 0{m~^). 
From this expansion it follows that as in Theorem 4 
we can perturb the cut-off point z in order to achieve 
the nominal coverage probability to the order o{m~^). 
Another point to note is that since the 0{m'~^) 
term Cj (or equivalently, c*) was not completely spec- 
ified, for any given z we can choose c*{A) (depend- 
ing on z and A) to make the 0{m~^) term in the 
expansion of the coverage probability disappear. In 
particular, the choice q = c*(A) with 

[z^ + 1) A 



c*{A) = 2g3,{A)-B^{Afb^{A) + 



4A 



-9m{A) 



will give an EB confidence interval that matches the 
nominal coverage probability to the order of o{m~^). 
In this section we have considered confidence in- 
tervals for individual small area means, which is the 
current state of the literature. In the early appli- 
cations of small area estimation, practitioners were 
only interested in point estimates (see, e.g.. Fay and 
Herriot, 1979). Only in the last twenty years or so, 
substantial development of the measures of uncer- 
tainty of the model-based estimates of small area 
means has taken place. Construction of appropri- 
ate confidence intervals for small area means is still 
limited and is restricted only to individual means. 
While in the EB setup confidence sets for several 
population means have been considered, this prob- 
lem is not fully addressed yet in small area estima- 
tion. In a recent article, Ganesh (2009) has consid- 
ered simultaneous credible intervals in small area 
estimation. However, calibrated confidence sets for 
multiple small area means in EB or EBLUP ap- 
proach have not been studied yet. 

6. OTHER IMPORTANT DEVELOPMENTS 
IN SMALL AREA ESTIMATION 

We mentioned in the Introduction that both area- 
level and unit-level data are available in small area 
estimation. In the previous sections we have concen- 
trated mostly on area-level models. In this section 
we review some of the results for unit-level mod- 
els. For a unit-level model let yij denote the value 
for the jth. unit in the iih small area, with j = 
1, . . . ,Ni, i = l, . . . ,m, where Ni is the size of the fi- 
nite population corresponding to the ith small area. 
Let 7j = N~ "^jliyij denote the finite population 
mean for the iih. small area. For notational simplic- 
ity let yij,j = l,...,ni,i = l,...,m denote values of 
the characteristic of the sampled units from these m 
small areas. Let the vector y(s) denote all the sam- 
pled values. A direct estimator of 7^ based on the ith 
area sample mean Yis is usually less reliable due to 
a small sample size n^. To borrow strength from the 
neighboring areas through shrinkage estimation the 
following model, known as the nested-error regres- 
sion model, has been found very useful for unit-level 
data. The model is given by 

^ij — '^ijP ~r Vi -\- Cij , 

(6.1) 

j = l,...,Ni,i = l,...,m, 

where Xjj is a p-component vector of auxiliary vari- 
ables, Vi and eij are independently distributed with 
Vi "~ ■ N{0, a^) and e^j -'^ A^(0, a^), j = 1,. . . ,Ni,i = 
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1, . . . ,m. We denote the observations for the samp- the residuals from the ordinary least squares regres- 
led units in the ith smah area by Y^ ^ = {Yn, . . . , sion of Yij - Yis on {xj^ - XjJ and Uij are the resid- 



Yim)^ ■ Similarly, Y^^ is used to denote the vec- 
tor of observations corresponding to the unsampled 
units in the ith small area. Battese, Harter and 
Fuller (1988) and Prasad and Rao (1990) used this 
model to develop EBLUP estimate of finite popu- 
lation mean 7j. They approximated ji for large Ni 
by 9i = 'K.JfB + Vi and used the predictor of Oi to 



n: 



T^^ X- 



is the known 



estimate 7^. Here Xj — ^,- ^j^i^ij 
mean vector of the auxiliary variables. 

Let Y'^) be obtained by stacking the vectors Y^^ 
for all the m small areas. Similarly, denote by X^^^ 
the matrix of p columns obtained by stacking 
the Xjj's corresponding to the sampled units. We 
also denote the variance of Y*^^^ by 5)ii. From Prasad 
and Rao (1990) the BLUP of 9i is obtained as 

(6.2) 0~.(^, y(i)) = Xf^ + 6,{Yis - xp)> 
where -0 = (cj^,o"g), and 

(6.3) p = (x(i)^i]r/x(i))-ix(i)^sri'Y(^) 

is the generalized least squares estimator of (3. Here 
Si = (j1{(t1 + Cgfi" )~^ is the shrinkage coefficient 
which shrinks the direct estimator Yis of 7j (or Oi) 
toward a regression surface. 

Under the superpopulation model given by (6.1), 
from Prasad and Rao (1990) and Datta and Ghosh 
(1991b) one can show that the BLUP of the finite 
population mean 7^ under the nested error regres- 
sion model is given by 

(6.4) 7.(^, yW) = m^ + (1 - /.)%)(^, Yd)), 

where _/i = nj/iVi, 6ii(„)('0, Y^^)) is given by (6.2), 
with Xj replaced by Xj(„), the mean of Xjj's for 
the Ni — rii unsampled units from the ith area. The 
BLUP of the small area mean 7^ usually depends 
on variance components, which in practice will be 
unknown. Estimates of variance components ■0 are 
plugged in to the BLUP to obtain EBLUP esti- 
mates. The variance components are estimated from 
the marginal distribution (by integrating out fj's) of 
the data, Y^^). 

While Datta and Lahiri (2000) suggested ML and 
REML estimation of the variance components, Pra- 
sad and Rao (1990) used ANOVA methods to ob- 
tain unbiased estimators for variance components 
in the nested error regression model. Prasad and 
Rao (1990) first obtained iij, Uij, j = 1, . . . ,nj, i = 
1, ... ,771, where {eij,j = 1,. . . ,ni,i = 1, . . . ,777} are 



uals from the ordinary least squares regression of Yij 



on Xjj. Estimators 



(6.5) 



<7, 



cr.: 



(77 ■ 



n^ 



m — p 






and 



^^ul-{n-p)&, 



are unbiased, where n:^ = n — ti[(X.^^' ^^^')~^ ■ 
YlT^i ''^l^is^Isl ' ^^'^ P* is equal to the number of lin- 
early independent vectors in the set {xjj — Xj^, j = 

l,...,77i,7 = l,...,777}. 

Second-order accurate approximations to MSE of 
the EBLUP of Oi were developed by Prasad and Rao 
(1990) and Datta and Lahiri (2000). These authors 
showed for the nested error regression model the 
three terms in the approximation [cf. (3.10)] are 



(6.6) g2i{il^) 



(l-<5^)^.^ 

(X,-5,x,,)^(X«^Sr/(^)xW)-i 
• (Xj -JjXjs), 



(6.7) 



53i(^)=?^i '^{(jl+al/ni)- 



2-2 



var(cj^6-, 



2 -2^ 



For an estimator tp of ip, from Prasad and Rao 
(1990) and Datta and Lahiri (2000) a second-order 
unbiased estimator of the MSE of the EBLUP of 9i 
is given by 



(6.8) 



mseiOii'ip)) = giii^p) + g2ii'ip) + 2gsi{'ip) 



b^(^;^)Vgi,(^), 



where h'^{'ip;ij^) is the asymptotic bias of ip, and 
V5ij(i/') is the gradient vector oi gui^jj). For estima- 
tors of variance components with asymptotic bias 
of o{m~^), the last term in (6.8) drops out. This 
happens for the ANOVA estimators suggested by 
Prasad and Rao (1990) and the REML estimators 
considered by Datta and Lahiri (2000). 

Estimation of the MSE of EBLUP outlined above 
and in Section 3 is based on Taylor's expansion. 
Alternatively, a resampling-based approach may be 
used to estimate the MSE. Laird and Louis (1987) 
suggested a bootstrap measure of accuracy of the EB 
estimator for the Fay-Herriot model. Subsequently, 
Butar and Lahiri (2003) adopted their approach in 
small area estimation. Further references to this lite- 
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rature may be found in Pfefferinann and Tiller 
(2005), Lahiri (2003) and Hall and Maiti (2006). 
Jiang, Lahiri and Wan (2002) proposed jackknife 
methods to estimate the MSE of the EBLUP. 

Datta and Ghosh (1991b) proposed a general HB 
model for unit-level data in small area estimation. 
Some earlier Bayesian analysis for two-stage sam- 
pling in a simpler framework is due to Scott and 
Smith (1969), with subsequent extension to the mul- 
tistage sampling by Malec and Sedransk (1985). Ba- 
sed on the superpopulation approach to finite popu- 
lation sampling Datta and Ghosh (1991b) developed 
HB estimates of small area means by deriving cer- 
tain predictive distributions. To that objective, they 
considered the following HB model: 

(A) Conditional on 13, A = (Ai, . . . , Af) and r, let 

Y ~ iV(X/3, r~i(* + ZD(A)Z'^)), 

where Y is A^ x 1 vector of characteristics of all 
the N units in the finite population, X and Z 
are N x p and N x q matrices, respectively, for 
appropriate known p and q. 

(B) /3,r and A have a certain joint prior distribu- 
tion. 

Stage (A) of the above model can be identified as 
a general mixed linear model (cf. Datta and Ghosh, 
1991b). To see this, write 

(6.9) Y = X/3 + Zv + e, 

where e and v are mutually independent with e ~ 
A^(0,r-i*),andv~A(0,r-iD(A)).Hereeis Axl, 
and V is g X 1 vector of random effects, 'I' is a known 
positive definite matrix and D(A) is a qx q p.d. ma- 
trix which is known except for A. 

In the context of finite population Y is parti- 
tioned as Y^ = (Y(i)^,Y(2)^), where Y^^) corre- 
sponds to the sampled units and Y^^^ corresponds 
to the unsampled units. Similarly, the design ma- 
trices X and Z are partitioned. To make inference 
about certain functions of Y, the Bayesian solution 
is obtained by deriving the predictive distribution 
of Y^^-* given Y^^^ = y'^-* (which is the posterior 
distribution of Y"^). In small area estimation the 
vector of sampled units Y'^^ is from m small areas. 
If Yj- is the {ui x 1) vector of sampled units from 
the ith. small area, then Y^^ = (Y^^^^, . . . , Y^^^). 
Similarly, the vector Y^^' corresponding to the un- 
sampled units can be partitioned. The finite popu- 
lation mean 7j from small area i is a linear function 
of Y*-^), and its predictive distribution may be de- 
rived from the distribution of Y*-^). In particular, 
based on a quadratic loss function, the HB estima- 



tor is given by the posterior mean of 7^, and a mea- 
sure of uncertainty is given by the posterior variance 
of 7j. While the solution for the general HB model 
is presented in Datta and Ghosh (1991b), we now 
spell out below some of the details for the nested 
error regression model. 

For the nested error regression model in (6.1), 
t = 1, r = (T~^ and Ai = a1/a1. To complete the HB 
model, Datta and Ghosh (1991b) assigned indepen- 
dent prior distribution on /3, ai and cr^. They put 
a uniform prior over RP for /3, and cjg ~ IG{ao/2, 
go/2) and a^ ~ /G'(ai/2,5i/2), where IG{P,a) is 
a distribution whose pdf is proportional to exp(— /?/ 
x)x~°'~^ . Quantities ao,go, gi are nonnegative and oi 
is positive, and are chosen suitably small to reflect 
diffused prior information on the variance compo- 
nents. 

The HB estimates for any reasonably complex mo- 
del do not admit any closed-form expressions, and 
they are evaluated by numerical computations. Re- 
quired posterior moments can be found either by 
Gibbs sampling (cf. Gelfand and Smith, 1990) or by 
numerical integration. Using formulas for iterated 
expectation and variance, Datta and Ghosh (1991b) 
have shown that the posterior mean and the poste- 
rior variance can be computed by evaluating several 
one-dimensional integrals with respect to the pos- 
terior density of Ai. In particular, the HB estimate 
of 7j is 

7.^^ = ii;[7.(Ai,y«)|y«], 

where the expectation £'[-|y'^^] is with respect to the 
posterior density of Ai, and 7j(Ai, y^^^) (with a slight 
abuse of notation) is the same as the expression 
of 'yi{'ip,Y^^') given in (6.4). Note that the above 
HB estimate of 7^ is obtained by shrinking the di- 
rect small area estimator Yis to an estimated regres- 
sion surface. Similarly, the posterior variance of ji 
can also be computed by numerical integration in- 
volving one-dimensional integrals. Alternatively, the 
Gibbs sampling can also be implemented very eas- 
ily for the present model. Indeed Datta and Ghosh 
(1991b) have shown that the set of complete condi- 
tional distributions are given by either multivariate 
normal or inverse gamma distributions. 

7. OTHER SMALL AREA ESTIMATORS 

7.1 Measurement Error Models 

In our presentation of the unit-level model, we 
have assumed so far that the covariates are mea- 
sured without error. However, sometimes it is not 
possible to obtain exact measurements of these co- 
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variates. For example, if in prediction of certain crop 
yield, the nitrogen level in the soil is a covariate, this 
covariate needs to be determined by analysis of soil 
sample. This will result in measurement error of the 
covariate. For the nested error regression model with 
a single covariate with measurement error Ghosh 
and Sinha (2007), Ghosh, Sinha and Kim (2006) 
and Torabi, Datta and Rao (2009) have considered 
estimation of small area means. While Ghosh and 
Sinha (2007) used a functional measurement error 
model, Ghosh, Sinha and Kim (2006) and Torabi, 
Datta and Rao (2009) considered a structural mea- 
surement error model for estimation of small area 
means. Ghosh, Sinha and Kim (2006) and Torabi, 
Datta and Rao (2009) used the model given by 



(7.1) 



yij = l3o + /3iXi + Vi + eij; 
j = l,...,ni;i = l,...,m. 



where as before yij is the response variable of the jth 
unit in the ith area (or stratum), Xi is the unknown 
true area-specific covariate associated with yij . Fur- 
ther, Vi '~'A^(0,(T^) and independent of Cij ~' 
A^(0, CTg). Under measurement errors, Xij{= Xi + Uij) 
are observed, where Uij '~ ' N{0,a'^). They also as- 
sumed that Xi '~ ' N{fix,crx)- The vector of model 
parameters is given hy 6 = {/3q, /3i, //a;, u^, cr^, cr^, Ug)-^, 
and Xi,Vi,eij and Uij are assumed to be mutually in- 
dependent. 

Based on the preceding model Ghosh, Sinha and 
Kim (2006) obtained the EB predictor of 7j by re- 
placing the model parameters by their estimates in 
the Bayes estimator of 7j based on the conditional 
distribution oiYij,j = ni + l,...,Ni, given 9 and yij , 
j = 1, . . . ,nj. Since Xij's are also stochastic Torabi, 
Datta and Rao (2009) instead first derived the fully 
efficient Bayes estimator of 7j based on the condi- 
tional distribution ofYij,j = ni + l,...,Ni, given 9, 
yij,j = 1, . . . ,ni, and XijJ = 1, . . . ,71^. Finally, they 
obtained an EB estimate of ji by replacing 9, the 
model parameters by their estimates as given in 
Ghosh, Sinha and Kim (2006). Torabi, Datta and 
Rao (2009) employed the jackknife method to ob- 
tain an estimate of mean squared prediction error 
(MSPE) of the EB predictor. For further details we 
refer to these two papers. 

7.2 Generalized Linear Models 

Until now we have considered small area estima- 
tion problems only for continuous-valued response. 
However, often in practice, response variables are 
binary or categorical. For example, in the SAIPE 



program, U.S. Census Bureau is interested in es- 
timating the poverty rates among school children. 
The response variable here is binary taking values 1 
and depending on whether the child is in poverty 
or not. More generally, the response variable may 
take values in multiple categories. Again, in the dis- 
ease mapping context, the response is typically the 
number of occurrences of a rare event. Generalized 
linear models are needed for the analysis of this kind 
of data. 

Both empirical and hierarchical Bayesian approa- 
ches have played an important role in developing 
small area estimates for discrete data. Dempster and 
Tomberlin (1980), Farrell, MacGibbon and Tomber- 
hn (1997) and MacGibbon and Tomberhn (1989) 
have obtained small area estimates of proportions 
based on EB techniques. A general EB formulation 
for simultaneous estimation of means from the nat- 
ural exponential family quadratic variance function 
family of distributions is due to Ghosh and Maiti 
(2004). They provided also estimated mean squared 
errors of the small area estimators. Earlier, for the 
binary case, Jiang (1998) and Jiang and Zhang (2001) 
obtained such mean squared error estimators based 
on the jackknife approach. On the other hand, a gen- 
eral hierarchical Bayesian approach based on gen- 
eralized linear models in the small area estimation 
context is due to Ghosh et al. (1998). 

7.3 Balanced Loss Functions 

HB and EB estimators in the small area context 
are mostly derived under squared error loss. As an 
alternative, Ghosh, Kim and Kim (2008) considered 
the balanced loss introduced and made popular by 
Zellner (1988, 1994). For simplicity, we go back to 
the framework of Section 2 where we considered 
small area models with equal number of observa- 
tions within each area. For an arbitrary estimator 
T = (Ti, . . . ,Tm)'^ of 6, the balanced loss is given 
by L{d,T) = m'^[w\\y - Tf + (1 - w)\\T - Of], 
where || • || is the Euclidean norm and w G [0, 1] is 
the known weight. The choice of w reflects the rela- 
tive weight which the experimenter wants to assign 
to goodness of fit and precision of estimation. The 
extreme cases w = and w = 1 refer solely to preci- 
sion of an estimate and goodness of fit, respectively. 

Under the balanced loss with a flat prior for (3, 
it follows from Section 2 that the Bayes estima- 
[l-{l-w)B]y+{l-w)BP^y 



e\ 



tor of is ^BAL 

with corresponding Bayes risk m~^E\\6 
V[{1 — B) + bvLp'{m — p)/m\. An EB estimator is ob- 
tained by substituting the same estimator B^^ 



B 
BAL 
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V{m -p- 2)/S or {B^^)+ = mm{B^^ , 1) of B as 
given in Section 2, where we may recall that S = 
\\y — PxyiP- The calculation of the Bayes risk of 
the resulting EB estimator is similar to that in Sec- 
tion 2. The details are omitted. The special case of 
the intercept model where x|^/3 = /i for all i was 
considered in Ghosh, Kim and Kim (2007, 2008). 
These authors also considered constrained Bayes es- 
timators along the lines of Louis (1984) and Ghosh 
(1992a). 

8. SUMMARY AND FUTURE RESEARCH 

The paper reviews several normal theory-based 
small area estimation techniques. In particular, the 
role of shrinkage estimation in the small area context 
is highlighted, and different variants of Stein-type 
shrinkers are discussed. Both hierarchical and em- 
pirical Bayesian methods are presented in the con- 
text of mixed linear models for unbalanced data, 
and are illustrated with specific small area problems. 
Empirical Bayes confidence intervals based on hier- 
archical normal models are provided. Extensions of 
these results to measurement error models and gen- 
eralized linear models are also touched upon. 

There are several promising areas of future re- 
search. As mentioned earlier, small area estimation 
needs explicit, or at least implicit, use of models. 
These model-based estimates can differ widely from 
the direct estimates, especially for areas with very 
low sample sizes. One potential drawback of the 
model-based estimates is that when aggregated, the 
overall estimate for a larger geographical area may 
be quite different from the corresponding direct es- 
timate, the latter being usually believed to be quite 
reliable. This is because the original survey was de- 
signed to achieve specified inferential accuracy at 
this higher level of aggregation. The problem can 
become more severe in the event of model failure as 
often there is no real check for the validity of the as- 
sumed model. Moreover, this overall agreement with 
the direct estimates may sometimes be politically 
necessary to convince the legislators of the utility of 
small area estimates. 

One way to avoid this problem is the so-called 
"benchmarking approach" which amounts to modi- 
fying these model-based estimates so that one gets 
the same aggregate estimate for the larger geograph- 
ical area. A simple illustration is to modify the model- 
based county-level estimates so that one matches the 
state-level direct estimate. Currently the most popu- 
lar approach is the so-called "raking" method which 



involves multiplying all the small area estimates by 
a constant factor so that the weighted total agrees 
with the direct estimate. Clearly, this is an ad hoc 
procedure with very little statistical foundation. 

It appears that constrained Bayes small area esti- 
mates (Louis, 1984; Ghosh, 1992b) will be particular- 
ly appropriate to achieve this end. Instead of match- 
ing the first two moments from the empirical his- 
togram of Bayes estimates with those from the pos- 
terior histogram of the parameters as in Louis (1984) 
or Ghosh (1992a), one should require that the aggre- 
gate or some weighted aggregate of these small area 
estimates should equal the large area aggregate es- 
timate. This can possibly be achieved even for fairly 
complex models. See also Shen and Louis (1998). 

The other interesting issue is to extend the mea- 
surement error model much further so that one can 
even handle discrete data and also more complex 
normal theory models. 
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